[Compatibility] Add String#byteindex and String#byterindex #3043
Thank you!
It would also be useful to run these Ruby 3.2-related specs on CI. To do so, the spec files are supposed to be listed in the
Done
finish_adjusted = Primitive.byte_index_to_character_index(self, finish)
finish_adjusted += str.size
finish_adjusted = size if finish_adjusted > size
finish = Primitive.character_index_to_byte_index(self, finish_adjusted)
Can't this be something like finish += str.bytesize?
I don't understand why this is necessary or what it does, though.
I guess I could have made the comment above clearer. Let's say the call is
("x" * 10).byterindex("xxx", 5)
Ruby starts the lookup by matching the pattern at index 5:
xxxxxxxxxx
     xxx
StringByteReverseIndexNode, on the other hand, matches the pattern at the end (non-inclusive), i.e. at index 2:
xxxxxxxxxx
  xxx
What makes this offset adjustment non-trivial is the difference in encoding. Since str might contain characters of a different byte length, finish += str.bytesize can fall on a non-codepoint boundary. Conceptually, we need to advance finish by str.size characters, expressed in bytes. Is there a better way to do this?
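To make the boundary concern above concrete, here is a minimal sketch in plain Ruby. It assumes UTF-8 strings; utf8_char_boundary? is a hypothetical helper written for this illustration and is not part of TruffleRuby or the PR.

```ruby
# Hypothetical helper (not from the PR): true when byte_index is the start of
# a character or the end of the string. Assumes the string is valid UTF-8,
# where continuation bytes have the bit pattern 10xxxxxx.
def utf8_char_boundary?(str, byte_index)
  return true if byte_index == str.bytesize
  byte = str.getbyte(byte_index)
  !byte.nil? && (byte & 0xC0) != 0x80
end

haystack = "aéb"  # bytes: "a" at 0, "é" at 1..2, "b" at 3
needle = "xx"     # 2 bytes, 2 characters

finish = 0
shifted = finish + needle.bytesize  # 2, which is the second byte of "é"

puts utf8_char_boundary?(haystack, finish)   # true
puts utf8_char_boundary?(haystack, shifted)  # false
```

Adding the needle's byte length moves a valid starting position into the middle of "é", which is the situation the character-index round trip in the patch was guarding against.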
Ruby will start the lookup matching the pattern on index 5:
These docs are pretty confusing: https://docs.ruby-lang.org/en/master/String.html#method-i-byterindex
Integer argument offset, if given and non-negative, specifies the maximum starting byte-based position in the string to end the search:
But indeed:
> ("x" * 10).byterindex("xxx", 10)
=> 7
> ("x" * 10).byterindex("xxx", 9)
=> 7
> ("x" * 10).byterindex("xxx", 8)
=> 7
> ("x" * 10).byterindex("xxx", 7)
=> 7
> ("x" * 10).byterindex("xxx", 6)
=> 6
> ("x" * 10).byterindex("xxx", 5)
=> 5
> ("x" * 10).byterindex("xxx", 4)
=> 4
So conceptually it's like a maximum limit to the returned value.
In my understanding, adding finish = Primitive.min(finish + str.bytesize, self.bytesize) is correct then.
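The suggested adjustment can be checked against the irb session above with a small plain-Ruby sketch. Both method names are hypothetical: last_match_ending_before stands in for a search that treats finish as an exclusive end bound, and byterindex_sketch applies the clamped shift. Negative offsets and character-boundary checks are deliberately left out.

```ruby
# Hypothetical stand-in for a reverse search with an exclusive end bound:
# the match must lie entirely within bytes [0, finish).
def last_match_ending_before(haystack, needle, finish)
  hay = haystack.b  # compare raw bytes, ignoring encoding
  pat = needle.b
  start = finish - pat.bytesize
  while start >= 0
    return start if hay.byteslice(start, pat.bytesize) == pat
    start -= 1
  end
  nil
end

# Sketch of the adjustment: widen the exclusive bound by the needle's byte
# length, clamped to the haystack's byte length, so that the original offset
# acts as the maximum starting position.
def byterindex_sketch(haystack, needle, offset)
  finish = [offset + needle.bytesize, haystack.bytesize].min
  last_match_ending_before(haystack, needle, finish)
end

s = "x" * 10
results = (4..10).map { |offset| byterindex_sketch(s, "xxx", offset) }
p results  # [4, 5, 6, 7, 7, 7, 7]
```

Without the shift, an offset of 5 would yield 2 rather than 5, which is the mismatch described earlier in the thread.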
can fall on a non codepoint boundary.
Why does it matter? I think if TruffleString can handle it we can just ignore that.
If it can't we could keep increasing finish by 1 until that position is a character head.
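The fallback mentioned above, moving finish forward until it reaches a character head, could look roughly like this in plain Ruby. This is only a sketch that assumes valid UTF-8 input; next_char_head is a hypothetical name, not a TruffleRuby primitive.

```ruby
# Hypothetical helper: advance byte_index one byte at a time until it is no
# longer on a UTF-8 continuation byte (10xxxxxx) or it reaches the end.
def next_char_head(str, byte_index)
  byte_index += 1 while byte_index < str.bytesize &&
                        (str.getbyte(byte_index) & 0xC0) == 0x80
  byte_index
end

haystack = "aéb"  # bytes: "a" at 0, "é" at 1..2, "b" at 3

puts next_char_head(haystack, 1)  # 1, already the head of "é"
puts next_char_head(haystack, 2)  # 3, skips the continuation byte of "é"
puts next_char_head(haystack, 4)  # 4, the end of the string
```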
Why does it matter? I think if TruffleString can handle it we can just ignore that.
I had assumed it matters. If we don't care and the code doesn't either (tested, it doesn't), then I guess it's fine. Fixed.
Add String#byterindex Add tests
Source: #3039
String#byteindex and String#byterindex have been added. [Feature #13110]